Summary

Motivation: Spontaneous adverse event reports have a high potential for detecting adverse drug reactions. However, because of their size, the analysis of such databases requires statistical methods. In this context, disproportionality measures can be used. Their main idea is to project the data onto contingency tables in order to measure the strength of associations between drugs and adverse events. However, because of this data projection, these methods are sensitive to the problems of co-prescription and masking effects. Recently, logistic regressions with a Lasso-type penalty have been used to detect associations between drugs and adverse events. On several examples, this approach limits the drawbacks of the disproportionality methods. However, the choice of the penalty value is open to criticism, since it strongly influences the results.
Results: In this paper, we propose to use a logistic regression whose sparsity is viewed as a model selection challenge. Since the model space is huge, a Metropolis-Hastings algorithm carries out the model selection by maximizing the BIC criterion. Thus, we avoid the calibration of a penalty or threshold. In our application to the French pharmacovigilance database, the proposed method is compared to well-established approaches on a reference data set and obtains better rates of positive and negative controls. However, many signals (i.e. specific drug-event associations) are not detected by the proposed method. We therefore conclude that this method should be used in parallel with existing measures in pharmacovigilance.

Availability: Code implementing the proposed method is available in R on request from the corresponding author.


Key words: Binary data, logistic regression, Metropolis-Hastings algorithm, model selection, pharmacovigilance, spontaneous reporting.


1 Introduction 


To obtain approval, drugs go through many premarket safety tests, but adverse drug reactions may not be detected during these experiments. Many national and international regulatory agencies have therefore introduced pharmacovigilance systems that collect spontaneously reported adverse events. Post-approval drug safety surveillance relies on these reported cases to raise suspicions that some drugs induce adverse events. These systems provide huge binary databases that describe each individual by his or her drug consumption and adverse events. Although spontaneous reporting systems suffer from many biases (Almenoff et al., 2007), they have permitted the early identification of associations between drugs and adverse events (Szarfman et al., 2002). To assist pharmacovigilance experts in managing such databases, statistical methods that aim to bring unexpected associations to light have been proposed.

The most classical methods are based on disproportionality measures and use data projections onto contingency tables. Among them, the most popular are the Proportional Reporting Ratio (Evans et al., 2001), the Reporting Odds Ratio (Van Puijenbroek et al., 2002), the Bayesian Confidence Propagation Neural Network (Bate et al., 1998) and the Gamma Poisson Shrinkage (DuMouchel, 1999). All of these methods use a specific statistic that requires a threshold for detecting associations between drugs and adverse events. The disproportionality measure is computed for each drug-event pair in the database and compared to the threshold. The data projections onto contingency tables make these methods computationally efficient. However, the same projections make them vulnerable to co-prescriptions and to masking effects from highly reported associations for some drugs (Caster et al., 2010). None of these methods is established as the reference approach. Owing to the shortage of gold standard sets, their comparison remains a challenging issue.
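To make the projection concrete, here is a minimal Python sketch that reduces one drug column and one adverse event column to a 2×2 table. It then computes the point estimates of the PRR and ROR from that table. The function names are hypothetical, and the sketch assumes the standard textbook definitions of the two ratios. The detection rules used later in the paper rely on further statistics built on such tables (see Table 3), which are not reproduced here.

```python
from typing import Sequence, Tuple


def contingency_table(drug: Sequence[int], event: Sequence[int]) -> Tuple[int, int, int, int]:
    """Project one drug column and one event column onto a 2x2 table (a, b, c, d).

    a: reports with drug and event     b: drug without event
    c: event without drug              d: neither
    """
    a = b = c = d = 0
    for x, y in zip(drug, event):
        if x and y:
            a += 1
        elif x:
            b += 1
        elif y:
            c += 1
        else:
            d += 1
    return a, b, c, d


def prr(a: int, b: int, c: int, d: int) -> float:
    """Proportional Reporting Ratio: event rate with the drug over event rate without it."""
    return (a / (a + b)) / (c / (c + d))


def ror(a: int, b: int, c: int, d: int) -> float:
    """Reporting Odds Ratio: odds of the event with the drug over odds without it."""
    return (a * d) / (b * c)


# Ten toy reports; the first four mention the drug.
drug = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
event = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
table = contingency_table(drug, event)
print(table, prr(*table), ror(*table))
```

The projection keeps only these four counts, so every other drug mentioned in the same reports is ignored. This is where the co-prescription and masking issues discussed above come from.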


Shrinkage logistic regression is an interesting alternative to the methods based on data projections onto contingency tables. In this spirit, Caster et al. (2010) propose to model the probability of an adverse event, conditionally on the drug consumptions, by a sparse logistic regression whose sparsity is imposed by a Lasso-type penalty (Tibshirani, 1996). In this context, drug j and adverse event h are claimed to be associated when the coefficient of drug j in the regression of adverse event h is strictly positive. In that case, the adverse event occurs more often with the consumption of this drug. However, choosing the penalty value is a crucial and very difficult task, because it directly influences the signal detection. Caster et al. (2010) propose to use the same penalty for all the regressions. Moreover, they set the penalty value so as to obtain the same number of signals as a disproportionality method. A more rigorous but more computationally demanding approach would set the penalty value by cross-validation, choosing the penalty that minimizes the misclassification error. However, as shown in our numerical application, this approach obtains poor results, notably because of the sparsity of the database. Recently, Harpaz et al. (2013) have used a full logistic regression in a two-step procedure whose first step empirically selects a subset of candidate drugs.


In this paper, signal detection is performed by a model selection step, which avoids the use of any threshold or the calibration of a penalty. In this context, a logistic regression model specifies which coefficients are non-zero. In a Bayesian framework, the best model has the highest posterior probability, but this quantity has no explicit form. Its logarithm can, however, be approximated by the Bayesian Information Criterion (BIC) (Schwarz, 1978). Therefore, signal detection consists in selecting the model that maximizes the BIC criterion. Unfortunately, the number of competing models is far too large for an exhaustive approach that computes the BIC criterion for each competing model. Model selection is therefore carried out by a Metropolis-Hastings algorithm (Robert and Casella, 2004), which performs a random walk through the models of interest. This algorithm is commonly used to find the maximum of a function, even on a discrete space. In our context, the mode of its stationary distribution corresponds to the model maximizing the BIC criterion. Moreover, we make the algorithm efficient by taking advantage of some features of the data.


In this paper, we compare our model-based procedure to the four disproportionality methods implemented in the R package PhViD and to the Lasso logistic regression implemented in the R package glmnet. We use the French pharmacovigilance database, which received roughly 20,000 suspected adverse drug reactions per year from 2000 to 2010. Comparing pharmacovigilance procedures is a difficult task. In this paper, we focus on the four adverse events described in the Observational Medical Outcomes Partnership (OMOP) reference set (Ryan et al., 2013) and on their 145 related drugs. To our knowledge, it is the only recently built reference set with positive and negative controls that addresses the issue of method assessment in pharmacovigilance.


This article is organised as follows. Section 2 presents the parsimonious version of the logistic regression. Section 3 introduces the Metropolis-Hastings algorithm devoted to model selection. Section 4 compares the proposed method to four disproportionality methods and to the Lasso logistic regression. Section 5 discusses the limitations and scope of the proposed approach.
2 Parsimonious logistic regression

2.1 Spontaneous reporting database 

Spontaneous reporting databases describe n individuals by their consumption of p drugs and by the presence or absence of d adverse events. For the purpose of logistic regression, in this article we consider one adverse event at a time. We denote it by the binary vector $\mathbf{y} = (y_1, \ldots, y_n) \in \mathcal{B}^n$, where $\mathcal{B} = \{0, 1\}$. More specifically, $y_i = 1$ if individual $i$ suffers from this adverse event and $y_i = 0$ otherwise. In the regression context, the explanatory variables $\mathbf{x} = (\mathbf{x}_1, \ldots, \mathbf{x}_n)$ indicate the presence or absence of drug consumptions. The binary vector $\mathbf{x}_i = (x_{i1}, \ldots, x_{ip}) \in \mathcal{B}^p$ indicates the drug consumption of individual $i$: $x_{ij} = 1$ if individual $i$ takes drug $j$ and $x_{ij} = 0$ otherwise.


2.2 Logistic regression 


The probability of the adverse event given the drug consumption is assumed to follow a logistic regression. Model $\gamma = (\gamma_1, \ldots, \gamma_p) \in \mathcal{B}^p$ defines which drugs influence the occurrence of the adverse event. If $\gamma_j = 1$, the regression coefficient of drug $j$ is unconstrained (i.e. defined on $\mathbb{R}$); if $\gamma_j = 0$, this coefficient is zero. The indices of the drugs with a non-zero (respectively zero) coefficient are grouped into the set $\mathcal{D}_{\gamma} = \{j : \gamma_j = 1\}$ (respectively $\bar{\mathcal{D}}_{\gamma} = \{j : \gamma_j = 0\}$).

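For model $\gamma$, the logit relationship is

$$\ln \frac{P(y_i = 1 \mid \mathbf{x}_i, \gamma, \beta)}{1 - P(y_i = 1 \mid \mathbf{x}_i, \gamma, \beta)} = \beta_0 + \sum_{j \in \mathcal{D}_{\gamma}} \beta_j x_{ij}, \tag{1}$$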


where $\beta = (\beta_0, \beta_1, \ldots, \beta_p) \in \Omega_{\gamma}$ is the vector of regression coefficients. Many of these coefficients are constrained by $\gamma$ to be zero, since

$$\Omega_{\gamma} = \{\beta \in \mathbb{R}^{p+1} : \forall j \in \bar{\mathcal{D}}_{\gamma},\ \beta_j = 0\}. \tag{2}$$

Thus, the drugs suspected to induce the adverse event are those belonging to $\mathcal{D}_{\gamma}$ and having a positive coefficient in the regression (i.e. $\beta_j > 0$).

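Assuming that spontaneous reports consist of $n$ i.i.d. observations, the adverse event log-likelihood of model $\gamma$ is

$$\ell_n(\mathbf{y} \mid \mathbf{x}, \gamma, \beta) = \sum_{i=1}^{n} \left[ y_i \Big(\beta_0 + \sum_{j \in \mathcal{D}_{\gamma}} \beta_j x_{ij}\Big) - \ln\Big(1 + \exp\Big(\beta_0 + \sum_{j \in \mathcal{D}_{\gamma}} \beta_j x_{ij}\Big)\Big) \right]. \tag{3}$$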


Only the coordinates of $\mathbf{x}_i$ whose indices belong to $\mathcal{D}_{\gamma}$ affect the log-likelihood value. In practice, it is often computationally more efficient to compute the adverse event log-likelihood from the unique profiles of the observations that affect it. This weighted form of the log-likelihood is described in Appendix A.
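A direct transcription of (3) makes the role of $\gamma$ explicit. The sketch below is a minimal NumPy version with a hypothetical function name. It assumes X is an n×p array of 0/1 values and beta a length-p coefficient vector; entries of beta outside $\mathcal{D}_{\gamma}$ are simply ignored.

```python
import numpy as np


def log_likelihood(y, X, gamma, beta0, beta):
    """Adverse-event log-likelihood (3) of model gamma.

    y: length-n 0/1 outcomes; X: n x p 0/1 drug matrix; gamma: length-p 0/1 model;
    beta0: intercept; beta: length-p coefficients (entries with gamma_j = 0 are ignored).
    """
    y = np.asarray(y, dtype=float)
    mask = np.asarray(gamma, dtype=bool)
    # Linear predictor restricted to the drugs in D_gamma.
    eta = beta0 + np.asarray(X, dtype=float)[:, mask] @ np.asarray(beta, dtype=float)[mask]
    # np.logaddexp(0, eta) = ln(1 + exp(eta)) without overflow.
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))


X = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
y = np.array([1, 0, 1, 0])
print(log_likelihood(y, X, gamma=(1, 0), beta0=-0.5, beta=[1.2, 9.0]))
```

Using `np.logaddexp(0, eta)` rather than `np.log(1 + np.exp(eta))` avoids overflow when a linear predictor is large.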

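From the database, the maximum likelihood estimate (MLE) $\hat{\beta}_{\gamma}$ is defined by

$$\hat{\beta}_{\gamma} = \arg\max_{\beta \in \Omega_{\gamma}} \ell_n(\mathbf{y} \mid \mathbf{x}, \gamma, \beta). \tag{4}$$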
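Equation (4) has no closed form. The following NumPy function sketches the Newton-Raphson iteration mentioned below. It maximizes (3) over $\Omega_{\gamma}$ by keeping only the intercept and the columns in $\mathcal{D}_{\gamma}$. The function name and the default tolerances are hypothetical choices, and this is not the authors' R implementation. It assumes the MLE exists, for which condition (5) below is a necessary condition; if the MLE does not exist, the iterates diverge.

```python
import numpy as np


def fit_mle(y, X, gamma, max_iter=100, tol=1e-10):
    """Newton-Raphson maximization of (3) over Omega_gamma, as in (4).

    Returns (beta_hat, max_loglik); beta_hat holds the intercept followed by
    the coefficients of the drugs with gamma_j = 1, in drug order.
    Assumes the MLE exists; otherwise the iterates diverge.
    """
    y = np.asarray(y, dtype=float)
    mask = np.asarray(gamma, dtype=bool)
    Z = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)[:, mask]])
    beta = np.zeros(Z.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(Z @ beta)))          # fitted P(y_i = 1)
        grad = Z.T @ (y - mu)                           # gradient of the log-likelihood
        info = Z.T @ (Z * (mu * (1.0 - mu))[:, None])   # minus the Hessian
        step = np.linalg.solve(info, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    eta = Z @ beta
    return beta, float(np.sum(y * eta - np.logaddexp(0.0, eta)))


# One binary drug: event rate 1/4 without it, 1/2 with it.
X = np.array([[0]] * 4 + [[1]] * 4)
y = np.array([1, 0, 0, 0, 1, 1, 0, 0])
beta_hat, max_ll = fit_mle(y, X, gamma=(1,))
print(beta_hat)  # close to [ln(1/3), ln(3)]
```

With a single binary drug, the fitted slope equals the log odds ratio of the corresponding 2×2 table, which gives a quick sanity check.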

To compute (4), we solve the likelihood equations (the zeros of the derivative of the log-likelihood) with the classical Newton-Raphson method (see Nocedal and Wright, 2006). However, the MLE is well defined only if the overlap conditions of Silvapulle (1981) are satisfied (see also the discussion of Owen and Roediger, 2014). Thus, for binary variables, the MLE is well defined only if

$$\forall (h_y, h_x) \in \mathcal{B}^2,\ \forall j \in \mathcal{D}_{\gamma},\ \exists i \in \mathcal{I}_{h_y} : x_{ij} = h_x, \tag{5}$$

where $\mathcal{I}_{h_y} = \{i : y_i = h_y\}$. In other words, for every drug $j$ in the model, (5) requires at least one absence and one presence of drug consumption in each of the sets $\{x_{ij} \mid y_i = 0\}$ and $\{x_{ij} \mid y_i = 1\}$. To ensure that the MLE is well defined, we therefore exclude the drugs that do not satisfy this condition.
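Condition (5) involves each drug separately, so it can be checked column by column before any model search. A minimal NumPy sketch, with a hypothetical function name:

```python
import numpy as np


def overlap_ok(y, X):
    """Boolean mask of the drugs satisfying condition (5).

    A drug passes when it is both taken and not taken at least once among
    the reports with the adverse event (y = 1), and likewise among the
    reports without it (y = 0).
    """
    y = np.asarray(y, dtype=bool)
    X = np.asarray(X, dtype=bool)
    ok = np.ones(X.shape[1], dtype=bool)
    for group in (X[y], X[~y]):
        ok &= group.any(axis=0) & (~group).any(axis=0)
    return ok


y = [1, 1, 0, 0]
X = [[1, 0, 1],
     [0, 0, 0],
     [1, 1, 1],
     [0, 0, 1]]
# Drug 0 passes; drug 1 is never taken with the event; drug 2 is taken in every report without it.
print(overlap_ok(y, X))
```

Restricting the search to the drugs flagged True leaves $2^{p'}$ candidate models, where $p'$ is the number of such drugs.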




M. Marbac, P. Tubert-Bitter, and M. Sedki: 


3 Model selection by MCMC algorithm 

3.1 Bayesian model selection 

We define the set of competing models $\Gamma$ as the set of models $\gamma \in \mathcal{B}^p$ for which (5) is satisfied:

$$\Gamma = \{\gamma \in \mathcal{B}^p : \text{(5) is satisfied for } \gamma\}. \tag{6}$$

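In a Bayesian framework, the aim is to obtain the model with the highest posterior probability $p(\gamma \mid \mathbf{y}, \mathbf{x})$. We assume a uniform prior distribution $p(\gamma \mid \mathbf{x})$ over the models $\gamma \in \Gamma$. Thus,

$$p(\gamma \mid \mathbf{y}, \mathbf{x}) \propto p(\mathbf{y} \mid \mathbf{x}, \gamma), \tag{7}$$

where $p(\mathbf{y} \mid \mathbf{x}, \gamma)$ is the integrated likelihood defined by

$$p(\mathbf{y} \mid \mathbf{x}, \gamma) = \int_{\Omega_{\gamma}} p(\mathbf{y} \mid \mathbf{x}, \gamma, \beta)\, p(\beta \mid \mathbf{x}, \gamma)\, d\beta. \tag{8}$$

Here $p(\mathbf{y} \mid \mathbf{x}, \gamma, \beta) = \exp(\ell_n(\mathbf{y} \mid \mathbf{x}, \gamma, \beta))$ is the likelihood of model $\gamma$, and $p(\beta \mid \mathbf{x}, \gamma)$ is the prior distribution of $\beta$, whose support is included in $\Omega_{\gamma}$. Since the logarithm is monotone,

$$\arg\max_{\gamma \in \Gamma} p(\gamma \mid \mathbf{y}, \mathbf{x}) = \arg\max_{\gamma \in \Gamma} \ln p(\mathbf{y} \mid \mathbf{x}, \gamma). \tag{9}$$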


When the integrated likelihood has no closed form, the Bayesian Information Criterion (BIC) is generally used. It is based on a second-order Laplace approximation of the logarithm of the integrated likelihood (Schwarz, 1978) and is defined as

$$\mathrm{BIC}(\gamma) = \ell_n(\mathbf{y} \mid \mathbf{x}, \gamma, \hat{\beta}_{\gamma}) - \frac{\nu_{\gamma}}{2} \ln n, \tag{10}$$

where $\nu_{\gamma} = 1 + \sum_{j=1}^{p} \gamma_j$ is the number of degrees of freedom of model $\gamma$. Therefore, we seek $\gamma^*$, the model maximizing the BIC criterion:

$$\gamma^* = \arg\max_{\gamma \in \Gamma} \mathrm{BIC}(\gamma). \tag{11}$$

This criterion selects the model providing the best trade-off between its fit to the data and its complexity.
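Given the maximized log-likelihood from (4), (10) is a one-line computation. The sketch below uses a hypothetical function name. For illustration only, it also prints the log-likelihood gain that one extra drug must bring to increase the BIC, using the n = 219,340 notifications of Section 4.1. That value follows directly from (10).

```python
import math


def bic(max_loglik, gamma, n):
    """BIC (10): maximized log-likelihood minus (nu_gamma / 2) ln n, with nu_gamma = 1 + sum(gamma)."""
    nu = 1 + sum(gamma)  # intercept + drugs in D_gamma
    return max_loglik - 0.5 * nu * math.log(n)


# Each extra drug costs ln(n) / 2 in BIC, so it must raise the maximized
# log-likelihood by more than that amount to be selected.
n = 219_340  # notifications in the French database (Section 4.1)
threshold = 0.5 * math.log(n)
print(round(threshold, 2))  # 6.15
```

With this n, a drug enters the selected model only if it improves the maximized log-likelihood by more than about 6.15.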

The number of competing models is far too large for an exhaustive approach (i.e. computing the BIC criterion for each model). Since (5) constrains each drug separately, $\Gamma$ contains $2^{p'}$ models, where $p'$ is the number of drugs satisfying (5). Therefore, the Metropolis-Hastings algorithm described in the following section is used to estimate $\gamma^*$.


3.2 Metropolis-Hastings algorithm for finding $\gamma^*$

Model $\gamma^*$ can be found with a Metropolis-Hastings algorithm (Robert and Casella, 2004), described in Algorithm 1, which performs a random walk over $\Gamma$. The unique invariant distribution of Algorithm 1 is proportional to $\exp(\mathrm{BIC}(\gamma))$. Therefore, $\gamma^*$ is the mode of its stationary distribution.

At each iteration, the algorithm proposes a move into a neighbourhood of the current model. A neighbouring model is defined as a copy of the current model in which just a few elements are altered. Thus, at iteration $r$, the candidate $\gamma$ is equal to the current model $\gamma^{[r-1]}$ except for at most $a \geq 1$ elements. More specifically, $\gamma$ is uniformly sampled in $V_a(\gamma^{[r-1]})$, where

$$V_a(\gamma^{[r-1]}) = \Big\{\gamma \in \mathcal{B}^p : \sum_{j=1}^{p} |\gamma_j - \gamma_j^{[r-1]}| \leq a\Big\}. \tag{12}$$
In the application, we set $a = 5$ to obtain good mixing properties. The candidate $\gamma$ is accepted with probability

$$\rho^{[r]} = \min\left(1, \frac{\exp(\mathrm{BIC}(\gamma))}{\exp(\mathrm{BIC}(\gamma^{[r-1]}))}\right). \tag{13}$$

Note that we set $\mathrm{BIC}(\gamma) = -\infty$ for all $\gamma \in \mathcal{B}^p \setminus \Gamma$. The algorithm performs $R$ iterations and returns the visited model maximizing the BIC criterion. In practice, there may be almost absorbing states, so running the algorithm from different initialisations increases the chance of visiting $\gamma^*$.


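Algorithm 1 Metropolis-Hastings algorithm performing the model selection

Initialisation: $\gamma^{[0]}$ is uniformly sampled in $\Gamma$.

For $r = 1, \ldots, R$:

Candidate step: $\gamma$ is uniformly sampled in $V_a(\gamma^{[r-1]})$.
Acceptance/rejection step: define $\gamma^{[r]}$ by

$$\gamma^{[r]} = \begin{cases} \gamma & \text{with probability } \rho^{[r]}, \\ \gamma^{[r-1]} & \text{otherwise.} \end{cases}$$

End For

Return $\arg\max_{r = 1, \ldots, R} \mathrm{BIC}(\gamma^{[r]})$.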
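The following Python sketch transcribes Algorithm 1 under stated assumptions. All names are hypothetical, and the `score` argument stands in for the BIC (10), which in practice requires a fit of (4) for each candidate. Uniform sampling in $V_a$ works in two steps: first draw the Hamming distance $k \le a$ with weight $\binom{p}{k}$, then flip $k$ distinct coordinates. Since $|V_a(\gamma)|$ is the same for every $\gamma$ and the neighbourhood relation is symmetric, the proposal is symmetric. The acceptance probability therefore reduces to (13), evaluated here in log space to avoid overflow. The sketch also assumes that the drugs failing (5) have been removed beforehand, so that a uniform draw in $\{0,1\}^p$ is a uniform draw in $\Gamma$.

```python
import math
import random
from typing import Callable, Optional, Tuple

Model = Tuple[int, ...]


def sample_neighbour(gamma: Model, a: int, rng: random.Random) -> Model:
    """Draw uniformly from V_a(gamma), the 0/1 vectors within Hamming distance a of gamma."""
    p = len(gamma)
    # V_a(gamma) holds comb(p, k) vectors at distance exactly k: draw k with that
    # weight, then flip k distinct coordinates chosen uniformly at random.
    sizes = [math.comb(p, k) for k in range(min(a, p) + 1)]
    k = rng.choices(range(len(sizes)), weights=sizes)[0]
    cand = list(gamma)
    for j in rng.sample(range(p), k):
        cand[j] = 1 - cand[j]
    return tuple(cand)


def metropolis_hastings(score: Callable[[Model], float], p: int, a: int = 5,
                        n_iter: int = 5000,
                        rng: Optional[random.Random] = None) -> Tuple[Model, float]:
    """Random walk with stationary distribution proportional to exp(score).

    score stands in for BIC(gamma) and may return -inf outside Gamma.
    Returns the best model visited and its score.
    """
    rng = rng or random.Random()
    # Uniform start on {0,1}^p, i.e. on Gamma once ineligible drugs are removed.
    current = tuple(rng.randint(0, 1) for _ in range(p))
    current_score = score(current)
    best, best_score = current, current_score
    for _ in range(n_iter):
        cand = sample_neighbour(current, a, rng)
        cand_score = score(cand)
        # Acceptance probability (13), min(1, exp(BIC(cand) - BIC(current))), in log space.
        if rng.random() < math.exp(min(0.0, cand_score - current_score)):
            current, current_score = cand, cand_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score


# Toy score peaked at a known model (a stand-in for the BIC).
target = (1, 0, 1, 1, 0, 0, 1, 0)


def toy_score(g: Model) -> float:
    """Minus ten times the Hamming distance to target."""
    return -10.0 * sum(x != t for x, t in zip(g, target))


best, best_score = metropolis_hastings(toy_score, p=8, a=2, n_iter=2000, rng=random.Random(0))
print(best, best_score)
```

The experiments of Section 4 use a = 5 and R = 5 × 10^3 with 100 restarts. In this sketch, a restart amounts to calling `metropolis_hastings` again with a different seed and keeping the best result.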


4 Results on a real data set

In this section, after presenting the French pharmacovigilance database, we compare the proposed method to the competing methods using the OMOP reference set. Finally, specific comments are given on the proposed method and on the Lasso.


4.1 Data 


To evaluate and compare the performance of the competing methods, we use the OMOP reference set of test cases (Ryan et al., 2013), which contains both positive and negative controls. Four adverse events (i.e. $d = 4$) were studied in this reference set: acute myocardial infarction (AMI), acute kidney injury (AKI), acute liver injury (ALI), and upper gastro-intestinal bleeding (GIB). There are 399 test cases, with 165 positive controls and 234 negative controls identified across the four adverse events of interest. More details are given in Table 1. Ryan et al. (2013) indicate that the majority of positive controls for AKI and GIB were supported by randomized clinical trial evidence. In contrast, the majority of positive controls for ALI and AMI were based only on published case reports.


Table 1 Numbers of positive and negative controls for the four adverse events in the OMOP reference set.

| Control | AMI | GIB | ALI | AKI |
|---|---|---|---|---|
| positive | 36 | 24 | 81 | 24 |
| negative | 66 | 67 | 37 | 64 |


Methods are compared on data extracted from the French pharmacovigilance database, in which notifications were collected from 2000 to 2010. The studied database contains n = 219,340 individual notifications and the consumption information for the p = 145 drugs mentioned in the OMOP reference set. Therefore, 145 × 4 = 580 drug-event pairs are studied. Among them, 145 are positive controls (25%), 153 are negative controls (26%) and 282 have an unknown status (49%). More details are given in Table 2. The four studied adverse events occur 495 (AMI), 4,746 (GIB), 10,910 (ALI) and 5,234 (AKI) times in the French pharmacovigilance database.
Table 2 Numbers of positive, negative and unknown signals for the four adverse events in the OMOP reference set, for the 145 drugs present in both databases (OMOP and French pharmacovigilance).

| Control | AMI | GIB | ALI | AKI |
|---|---|---|---|---|
| positive | 29 | 20 | 75 | 21 |
| negative | 43 | 46 | 22 | 42 |
| unknown | 73 | 79 | 48 | 82 |


4.2 Competing methods 

Disproportionality-based methods. We chose to compare our method with all the disproportionality methods implemented in the R package PhViD (Ahmed and Poncet, 2013). Thus, four disproportionality-based methods are considered: the Proportional Reporting Ratio (PRR), the Reporting Odds Ratio (ROR), the Reporting Fisher Exact Test (RFET) (Ahmed et al., 2010) and the FDR-based Gamma Poisson Shrinkage (GPS) (Ahmed et al., 2009). Their specific statistics are used with a threshold of 0.05 and are presented in Table 3. All methods are compared on the 580 drug-event pairs mentioned in the OMOP reference set.


Table 3 Specific statistics of the disproportionality methods: statistic (Stat.), minimal number of individuals having a drug-event pair to claim this pair as a signal (Min.) and reference (Ref.).

| Method | Stat. | Min. | Ref. |
|---|---|---|---|
| PRR | p-value of rank | 3 | Evans et al. (2001) |
| ROR | p-value of rank | 3 | Van Puijenbroek et al. (2002) |
| RFET | mid-p-value | 1 | Ahmed et al. (2010) |
| GPS | prob. of H0 | 1 | Ahmed et al. (2009) |


Lasso-based logistic regressions. The results of the Lasso applied to logistic regressions are obtained with the R package glmnet (Friedman et al., 2010). The penalty value is selected by ten-fold cross-validation, keeping the most parsimonious model among the models with the best misclassification error. This method finds few signals, since the selected penalty leaves only the intercept non-zero for all adverse events except one (AMI). This example shows how difficult it is to calibrate the Lasso penalty. Indeed, the misclassification error is roughly constant with respect to the penalty value, because of the low proportion of notifications mentioning a given adverse event.


Model-based logistic regressions. For each of the four adverse events, Algorithm 1 is run from 100 random initialisations with $a = 5$ and $R = 5 \times 10^3$ iterations. The model maximizing the BIC criterion is returned. Table 4 presents the number of competing models for each adverse event, which corresponds to the cardinality of $\Gamma$ defined in (6).


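Table 4 Number of drugs satisfying (5) and number of competing models ($|\Gamma|$) for each adverse event.

| Adverse event | AMI | GIB | ALI | AKI |
|---|---|---|---|---|
| Number of drugs satisfying (5) | 66 | 97 | 123 | 107 |
| Number of competing models | $2^{66}$ | $2^{97}$ | $2^{123}$ | $2^{107}$ |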
4.3 Method comparison

Table 5 presents the rates of positive controls, negative controls and unknown signals among the signals detected by each competing method.


Table 5 Main results obtained by the competing methods, ordered by their rate of positive controls: number of signals (NS), rate of positive controls (RPC), rate of negative controls (RNC) and rate of unknown signals (RUS).

| Method | NS | RPC | RNC | RUS |
|---|---|---|---|---|
| Logistic BIC (Algorithm 1) | 70 | 0.54 | 0.01 | 0.45 |
| RFET | 114 | 0.51 | 0.06 | 0.43 |
| PRR | 73 | 0.51 | 0.10 | 0.40 |
| ROR | 120 | 0.50 | 0.07 | 0.43 |
| GPS | 129 | 0.48 | 0.07 | 0.45 |
| Lasso-CV | 13 | 0.46 | 0.08 | 0.46 |


The proposed method obtains the best rates of positive and negative controls. It detects 70 signals, while the Lasso-based method finds only 13 pairs. The poor results of the Lasso are explained by the penalty values selected through the misclassification error rate. Indeed, the resulting penalty values constrain all the drug coefficients to be zero for three adverse events. All the disproportionality methods obtain similar results. Although these methods detect many signals (between 73 and 129), their rates of positive and negative controls are worse than those of the proposed method.

Since the proposed method obtains the best rates of positive and negative controls, we conclude that it is more precise for signal detection. However, it finds fewer signals than the disproportionality methods. It therefore lets the practitioner focus on the drug-event pairs that are more likely to be related. Moreover, some associations detected only by the disproportionality methods could be due to the co-prescription phenomenon.

4.4 Specific comments about the proposed method 

Table 6 indicates the computing time obtained on an Intel(R) Xeon(R) 3.00 GHz CPU and the number of runs in which Algorithm 1 finds the best model.

Table 6 General results of Algorithm 1: number of runs (out of 100) in which $\gamma^*$ has been found (model), number of signals (nb signals), number of positive controls, number of negative controls, computing time in minutes required for one Markov chain realization (time) and number of unique profiles for the best model ($m_{\gamma^*}$).

| Adverse event | AMI | GIB | ALI | AKI |
|---|---|---|---|---|
| model | 100 | 67 | 50 | 56 |
| nb signals | 9 | 10 | 26 | 25 |
| positive controls | 1 | 5 | 20 | 12 |
| negative controls | 1 | 0 | 0 | 0 |
| time | 1 | 3 | 3 | 5 |
| $m_{\gamma^*}$ | 45 | 629 | 554 | 1024 |


The computing time has been strongly reduced by using the expression of the log-likelihood given in Appendix A. For example, consider the best model for the adverse event AMI, in which 9 variables have a non-zero coefficient. For this model, the database reduces to $m_{\gamma^*} = 45$ unique weighted profiles (see Appendix A). Moreover, since many different initialisations find $\gamma^*$, the number of initialisations (set to 100 in this experiment) could be reduced. Finally, the list of the detected signals is presented in Appendix B.
4.5 Specific comments about the Lasso


We have seen that the Lasso obtains poor results when the penalty is selected according to the misclassification error. Caster et al. (2010) suggest setting the same penalty value for all the adverse events. Moreover, they use a disproportionality measure to determine the number of signals, from which they deduce the penalty value.


To investigate the behaviour of the Lasso approach, we build a sequence of penalty values that yield different numbers of signals. The rates of positive and negative controls obtained for each penalty value are indicated by the black curves of Figure 1.


Fig. 1 Rates of positive and negative controls obtained by the Lasso with different penalties (black curves) and by the model maximizing the BIC criterion (red dots).

[Figure 1: x-axis, number of signals.]


The results of the model maximizing the BIC criterion are indicated by red dots. In Figure 1, it is very hard to find a penalty value that yields a better trade-off between positive and negative controls. For the same number of signals (70) as obtained by Algorithm 1, the Lasso performs slightly better, but the corresponding penalty value does not result from an optimization procedure. Such figures cannot be drawn in practice, since the true status of the signals is unknown. Thus, selecting the model maximizing the BIC criterion seems more efficient than using a Lasso regression. Indeed, the penalty calibration is very difficult, and the results obtained with the "best" penalty value are similar to those of the model maximizing the BIC criterion. Moreover, this penalty value is not accessible in practice.
5 Discussion


In this paper, we have proposed a method for analysing individual spontaneous reporting databases that avoids the drawbacks of the disproportionality-based measures (co-prescription and masking effects). Signal detection is performed through parsimonious logistic regressions whose degree of sparsity is treated as a model selection problem. We thereby avoid Lasso-type methods, which require the challenging calibration of a penalty. The combinatorial problem of model selection is handled by Metropolis-Hastings sampling over the binary model space.


Despite the difficulties of evaluating pharmacovigilance methods, the OMOP reference set of Ryan et al. (2013) gave us the opportunity to compare the proposed method to the reference approaches on real data. On these data, the proposed method appears relevant for signal detection. However, many signals are not detected by our method. We therefore conclude that this method should be used in parallel with existing measures in pharmacovigilance.

The proposed approach can handle the whole French pharmacovigilance database, which consists of n = 219,340 individual notifications, p = 2,114 drugs and d = 4,257 adverse events. We have shown that the size of the model space is determined by the number of drugs satisfying (5). Figure 2 shows how this number evolves with the headcount of the adverse event.

Fig. 2 Evolution of the number of drugs satisfying (5) according to the headcount of the adverse event.

[Figure 2: x-axis, headcount of the adverse event.]


In the whole database, 75% of the adverse events can be associated with fewer than 42 drugs. For the adverse events with fewer than 12 drugs satisfying (5), we advise using an exhaustive approach that computes the BIC criterion for each competing model in $\Gamma$. Model selection on the whole French pharmacovigilance database requires several days of computing time. The proposed approach can thus be used to investigate targeted adverse events. Finally, a preliminary drug selection could reduce the computing time.
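With fewer than 12 eligible drugs there are at most $2^{11} = 2{,}048$ models, so each can be scored directly. Below is a minimal sketch of that exhaustive search, where `score` is a hypothetical callable returning the BIC (10) of a model.

```python
from itertools import product
from typing import Callable, Optional, Tuple

Model = Tuple[int, ...]


def exhaustive_search(score: Callable[[Model], float], p: int) -> Tuple[Optional[Model], float]:
    """Score all 2**p models and return the best one with its score.

    Returns (None, -inf) if every model scores -inf.
    """
    best: Optional[Model] = None
    best_score = float("-inf")
    for gamma in product((0, 1), repeat=p):
        s = score(gamma)
        if s > best_score:
            best, best_score = gamma, s
    return best, best_score


# Toy score (a stand-in for the BIC) peaked at (0, 1, 1).
print(exhaustive_search(lambda g: -sum(x != t for x, t in zip(g, (0, 1, 1))), p=3))
```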
A Weighted form of the adverse event log-likelihood

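Only the coordinates of $\mathbf{x}_i$ whose indices belong to $\mathcal{D}_{\gamma}$ affect the log-likelihood value. For each observation $\mathbf{x}_i$, let $|\gamma| = \sum_{j=1}^{p} \gamma_j$. We denote by $\mathbf{x}_i^{\gamma} \in \mathcal{B}^{|\gamma|}$ the vector containing the elements of $\mathbf{x}_i$ that affect the log-likelihood (i.e. the elements of $\mathbf{x}_i$ whose index belongs to $\mathcal{D}_{\gamma}$). Thus, for each $j = 1, \ldots, |\gamma|$,

$$x_{ij}^{\gamma} = x_{ij_0} \quad \text{with} \quad j_0 = \min\Big\{j' : \sum_{j''=1}^{j'} \gamma_{j''} = j\Big\}. \tag{14}$$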


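Moreover, many individual profiles $(\mathbf{x}_i^{\gamma}, y_i)$ occur many times in the database. We denote by $m_{\gamma}$ the number of distinct profiles affecting the log-likelihood of model $\gamma$. The $i$-th distinct profile is denoted by $(\tilde{\mathbf{x}}_i^{\gamma}, \tilde{y}_i^{\gamma})$ and its weight (number of occurrences) by $w_i^{\gamma}$. Thus, (3) is given by

$$\ell_n(\mathbf{y} \mid \mathbf{x}, \gamma, \beta) = \sum_{i=1}^{m_{\gamma}} w_i^{\gamma} \left[\tilde{y}_i^{\gamma} \Big(\beta_0 + \sum_{j=1}^{|\gamma|} \beta_j^{\gamma} \tilde{x}_{ij}^{\gamma}\Big) - \ln\Big(1 + \exp\Big(\beta_0 + \sum_{j=1}^{|\gamma|} \beta_j^{\gamma} \tilde{x}_{ij}^{\gamma}\Big)\Big)\right], \tag{15}$$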
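where $\beta_j^{\gamma}$ is the $j$-th element of $(\beta_1, \ldots, \beta_p)$ that is not constrained to be zero. Thus, for each $j = 1, \ldots, |\gamma|$,

$$\beta_j^{\gamma} = \beta_{j_0} \quad \text{with} \quad j_0 = \min\Big\{j' : \sum_{j''=1}^{j'} \gamma_{j''} = j\Big\}. \tag{16}$$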


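In practice, it is often computationally more efficient to compute the adverse event log-likelihood with (15) than with (3), since $m_{\gamma}$ is typically much smaller than $n$ (e.g. $m_{\gamma^*} = 45$ for AMI, against n = 219,340 notifications).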
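The profile compression behind (15) can be sketched with NumPy's `np.unique` on the rows restricted to $\mathcal{D}_{\gamma}$ plus the outcome. The function names are hypothetical, and `beta_gamma` is assumed to hold the coefficients $\beta_j^{\gamma}$ of (16) in drug order. On such inputs the weighted sum should agree with a direct evaluation of (3) up to floating-point rounding.

```python
import numpy as np


def unique_profiles(y, X, gamma):
    """Collapse the n reports into the m_gamma distinct profiles and their weights.

    Only the columns in D_gamma and the outcome are kept, since nothing else
    affects the log-likelihood of model gamma.
    """
    mask = np.asarray(gamma, dtype=bool)
    rows = np.column_stack([np.asarray(X)[:, mask], np.asarray(y)]).astype(np.int8)
    profiles, weights = np.unique(rows, axis=0, return_counts=True)
    return profiles[:, :-1], profiles[:, -1], weights


def weighted_log_likelihood(y, X, gamma, beta0, beta_gamma):
    """Weighted log-likelihood (15); beta_gamma holds the |gamma| coefficients of (16)."""
    xs, ys, w = unique_profiles(y, X, gamma)
    eta = beta0 + xs @ np.asarray(beta_gamma, dtype=float)
    return float(np.sum(w * (ys * eta - np.logaddexp(0.0, eta))))


rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 6))
y = rng.integers(0, 2, size=500)
gamma = (1, 0, 1, 1, 0, 0)
xs, ys, w = unique_profiles(y, X, gamma)
print(len(w), "distinct profiles instead of", len(y), "reports")
```

With three selected drugs and a binary outcome there can be at most 2^4 = 16 distinct profiles, whatever the number of reports.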


B Signals detected by the proposed method

Table 7 presents the drug-event pairs detected by the proposed method.


Table 7 List of the signals detected by the proposed method.

| Adverse event | ATC | Headcount | $\hat{\beta}_j$ | OMOP control |
|---|---|---|---|---|
| AMI | L03AB07 | 7 | 2.92 | unknown |
| ALI | J05AE09 | 36 | 2.85 | positive |
| AMI | N02CC03 | 6 | 2.6 | positive |
| ALI | L01BB03 | 8 | 2.51 | positive |
| AKI | M01AE09 | 46 | 2.27 | unknown |
| ALI | C02KX01 | 65 | 2.25 | positive |
| AMI | M01AH01 | 21 | 2.05 | unknown |
| AMI | L01BC05 | 10 | 1.85 | unknown |
| GIB | M01AC01 | 138 | 1.84 | positive |
| AMI | J05AF05 | 92 | 1.84 | unknown |
| GIB | B01AC04 | 523 | 1.77 | positive |
| AKI | J05AF07 | 144 | 1.76 | unknown |
| GIB | B01AC07 | 31 | 1.67 | unknown |
| AKI | C09AA05 | 353 | 1.66 | unknown |
| AKI | C09AA03 | 165 | 1.66 | positive |
| AKI | C09CA08 | 35 | 1.65 | positive |
| ALI | L02BB01 | 10 | 1.61 | positive |
| ALI | J05AG01 | 297 | 1.55 | positive |
| ALI | J02AC03 | 117 | 1.54 | positive |
| AKI | C09AA02 | 146 | 1.51 | positive |
| GIB | M01AE03 | 276 | 1.5 | positive |
| AKI | C09AA10 | 38 | 1.49 | unknown |
| AKI | N05AD08 | 10 | 1.48 | unknown |
| AKI | L04AD01 | 91 | 1.38 | positive |
| GIB | M01AE02 | 52 | 1.34 | positive |
| ALI | J01XE01 | 52 | 1.33 | positive |
| ALI | J04AB02 | 538 | 1.31 | positive |
| AKI | C09CA07 | 34 | 1.31 | positive |
| AKI | M01AE03 | 250 | 1.31 | positive |
| AMI | L04AB02 | 13 | 1.29 | unknown |
| AKI | L01BA01 | 129 | 1.28 | unknown |
| ALI | A03AX13 | 26 | 1.25 | unknown |
| ALI | A07EC01 | 71 | 1.24 | unknown |
| AKI | L01BC05 | 57 | 1.18 | unknown |
| AKI | C09CA06 | 139 | 1.16 | positive |
| GIB | M01AH01 | 98 | 1.15 | unknown |
| ALI | J02AC02 | 22 | 1.15 | positive |
| AMI | B01AC04 | 24 | 1.14 | unknown |
| ALI | J04AC01 | 359 | 1.08 | positive |
| AKI | C09AA06 | 25 | 1.06 | unknown |
| AKI | C09AA01 | 61 | 1.02 | positive |
| ALI | N03AF01 | 248 | 0.99 | positive |
| AKI | M01AE02 | 43 | 0.99 | positive |
| ALI | D01AE15 | 77 | 0.98 | positive |
| AMI | J05AF02 | 30 | 0.98 | negative |
| ALI | L03AB07 | 27 | 0.98 | positive |
| ALI | J02AC01 | 188 | 0.97 | positive |
| ALI | G03CA03 | 76 | 0.96 | unknown |
| AKI | J04AB02 | 104 | 0.96 | unknown |
| GIB | A12BA01 | 155 | 0.94 | positive |
| AKI | J01MA02 | 147 | 0.87 | unknown |
| AMI | J05AF06 | 44 | 0.83 | unknown |
| ALI | L01BA01 | 186 | 0.81 | positive |
| AKI | M04AA01 | 220 | 0.77 | positive |
| AKI | C03AA03 | 430 | 0.73 | positive |
| AKI | M01AH01 | 72 | 0.68 | unknown |
| GIB | C08DB01 | 81 | 0.63 | unknown |
| AKI | A12BA01 | 154 | 0.62 | unknown |
| ALI | A10BF01 | 36 | 0.61 | unknown |
| AKI | M01AC01 | 47 | 0.59 | positive |
| ALI | N03AG01 | 298 | 0.56 | positive |
| ALI | J01MA06 | 60 | 0.54 | positive |
| ALI | N05BA05 | 147 | 0.53 | unknown |
| ALI | J05AF07 | 177 | 0.49 | positive |
| ALI | M04AA01 | 216 | 0.45 | positive |
| AKI | J01MA01 | 109 | 0.43 | unknown |
| GIB | C08CA01 | 126 | 0.37 | unknown |
| ALI | N06AB04 | 117 | 0.36 | unknown |
| GIB | C09AA05 | 148 | 0.35 | unknown |
| ALI | M01AE03 | 200 | 0.31 | unknown |
| ALI | J01MA02 | 202 | 0.31 | positive |